ABSTRACT

Symbolic execution techniques have been proposed recently for the 
probabilistic analysis of programs. These techniques seek to quan- 
tify the likelihood of reaching program events of interest, e.g., as- 
sert violations. They have many promising applications but have 
scalability issues due to high computational demand. To address 
this challenge, we propose a statistical symbolic execution tech- 
nique that performs Monte Carlo sampling of the symbolic program 
paths and uses the obtained information for Bayesian estimation 
and hypothesis testing with respect to the probability of reaching 
the target events. To speed up the convergence of the statistical 
analysis, we propose Informed Sampling, an iterative symbolic ex- 
ecution that first explores the paths that have high statistical signif- 
icance, prunes them from the state space and guides the execution 
towards less likely paths. The technique combines Bayesian esti- 
mation with a partial exact analysis for the pruned paths leading to 
provably improved convergence of the statistical analysis. 

We have implemented statistical symbolic execution with in- 
formed sampling in the Symbolic PathFinder tool. We show exper- 
imentally that the informed sampling obtains more precise results 
and converges faster than a purely statistical analysis and may also 
be more efficient than an exact symbolic analysis. When the latter 
does not terminate, symbolic execution with informed sampling can
give meaningful results under the same time and memory limits. 

Categories and Subject Descriptors 

D.2.4 [Software Engineering]: Software/Program Verification — 
Model checking, Reliability, Statistical methods 

1. INTRODUCTION 

Several techniques have been proposed recently for the proba- 
bilistic analysis of programs [0, El DU]. These techniques have 
multiple applications, ranging from program understanding and de- 
bugging to computing reliability of software operating in uncertain 
environments. 

For example, in previous work [0, IQ], we described a bounded 
symbolic execution of a program that uses a quantification proce- 
dure over the collected symbolic constraints to compute the counts 
of the inputs that follow the explored program paths. These counts
are then used to compute the probability of executing different 
paths through the program (or of violating program assertions), un- 
der given probabilistic usage profiles. While promising, these exact 
techniques have scalability issues due to the large number of sym- 
bolic paths to be explored. 

To address this problem we describe a statistical symbolic ex- 
ecution technique that uses randomized sampling of the symbolic 
paths. For deciding termination of sampling we investigate two dif- 
ferent criteria: Bayesian estimation and hypothesis testing [O, ESQ. 
The first is used to estimate the probability of executing designated 
program paths while the latter is used to test a given hypothesis 
about such probability. Unlike in a typical statistical setting where 
one samples randomly across a concrete input domain, our samples 
are done in the context of symbolic execution, according to condi- 
tional probabilities computed at each branching point in the pro- 
gram. This approach is similar to statistical model checking [E3, 
ED, with the difference that we work with code not with models 
and we sample symbolic paths, where the probabilistic information 
is computed based on the collected symbolic constraints. 

When using Bayesian estimation, the randomized sampling ter- 
minates when pre-specified confidence and error bounds (accuracy) 
have been achieved. The answer to the analysis problem is not 
guaranteed to be correct, but the probability of a wrong answer can 
be made arbitrarily small [03]. However, in practice, the conver- 
gence to an answer might be very slow. Hypothesis testing can be 
faster m, but both techniques may require a very large number of 
sample paths to achieve the desired statistical confidence. 

To speed up both methods, we propose Informed Sampling (IS), 
an iterative technique combining statistical methods with partial ex- 
act analysis. At each iteration, IS randomly samples a set of exe- 
cution paths and performs a statistical analysis of the sample. The 
probability of sampling each path is proportional to the number of 
input points following it under the specified usage profile, so as not
to bias the sample. If the statistical method converged, its result
is returned. Otherwise the already sampled paths are pruned out 
from the execution tree and analyzed exactly. The next iteration 
will then focus on the analysis of only the remaining part of the 
execution tree, which also increases the chances of selecting
low-probability paths that might not have been sampled (and pruned) during
the previous iterations.

For pruning the sampling space we propose an efficient proce- 
dure that leverages the counts of the inputs associated with each ex- 
plored symbolic path and subtracts them from the counts of all the 
prefixes along the path. The intuition is that, at the end of each iter- 
ation, the counts should keep track of the number of inputs that still 
need to be explored (sampled) for the execution to follow that path. 
The counts keep decreasing with each iteration, and if a counter
becomes 0 it means that the subtree rooted at that node has been fully
explored and can be safely pruned from the search space.

For estimating the probability results we propose a combination 
of exact analysis (for the paths that are pruned in previous itera- 
tions) and Bayesian statistical analysis (for the paths sampled in 
the current iteration over the pruned state space). The analysis ter- 
minates when the pre-specified confidence and error bounds have 
been achieved (for Bayesian estimation) or when the hypothesis is 
confirmed (for hypothesis testing). The analysis may also terminate 
when all the paths have been explored, in which case the results will 
be the same as for the exact analysis. IS converges faster and re- 
quires fewer samples than the purely random sampling techniques, 
since the set of samples is different with each iteration and each 
pruned path set is analyzed exactly (with confidence 1). Further- 
more, the probability of finding the target program events increases 
with each iteration. 

The main focus of this work is on computing non-functional 
properties of programs, such as the probability of successful ter- 
mination (or conversely the probability of failure) under a given 
usage profile [03]. However, statistical symbolic execution with IS
can also be used for improving “classical” (non-probabilistic) sym- 
bolic execution, in the sense that, if symbolic execution runs out 
of resources (time, memory) the statistical techniques can be used 
to provide useful information with statistical guarantees. Note also 
that the statistical techniques provide an “any time” approach to 
software analysis: the longer they run, the better the results. 

We make the following contributions: 

• statistical symbolic execution with two stopping criteria
(Bayesian estimation and hypothesis testing) and implementation
within the Symbolic PathFinder tool [E3];

• IS that converges faster than Monte Carlo sampling; 

• an efficient procedure for pruning the state space for incre- 
mental symbolic execution; 

• combined statistical and exact information for (1) Bayesian
estimation and (2) hypothesis testing;

• experimental evidence showing the improvement of IS over 
state-of-the-art statistical approaches. 

2. BACKGROUND 
2.1 Symbolic Execution 

Symbolic execution is an extension of normal execution in which 
the semantics of the basic operators of a language is extended to 
accept symbolic inputs and to produce symbolic formulas as out- 
put L1151J. The behavior of a program P is determined by the values 
of its inputs and can be described by means of a symbolic execu- 
tion tree where tree nodes are program states and tree edges are 
the program transitions as determined by the symbolic execution 
of program instructions. 

The state s of a program is defined by the tuple (IP, V, PC) where
IP represents the next instruction to be executed, V is a mapping 
from each program variable v to its symbolic value (i.e., a sym- 
bolic expression in terms of the symbolic inputs), and PC is a path 
condition. PC is a conjunction of constraints over the symbolic 
inputs that characterizes exactly those inputs that follow the path 
from the program's initial state to state s. 

The current state s and the next instruction IP define the set of 
transitions from s. Without going into the details of every Java 
instruction, we informally define these transitions depending on the 
type of instruction pointed to by IP. 


Assignment. The execution of an assignment to variable $v \in V$
leads to a new state where IP is incremented to point to the next 
instruction and V is updated to map v to its new symbolic value. 
PC does not change. 

Branch. The execution of an if-then-else instruction on condition
$c$ introduces two new transitions. The first leads to a state $s_1$
where $IP_1$ points to the first instruction of the then block and the
path condition is updated to $PC_1 = PC \land c$. The second leads to a
state $s_2$ where $IP_2$ points to the first instruction of the else block and
the path condition is updated to $PC_2 = PC \land \neg c$. If the path
condition associated with a branch is not satisfiable, the new transition
and state are not added to the symbolic execution tree.

Loop. A while loop is unrolled until its condition evaluates to 
false or a pre-specified exploration depth limit is reached. Analo- 
gous transformations are applied to other loop constructs. 

The initial state of a program is $s_0 = (IP_0, V_0, PC_0)$, where $IP_0$
points to the first instruction of the main method, $V_0$ maps the
arguments of main (if any) to fresh symbolic values, and $PC_0 = true$. A
program may also have one or more terminal states that represent 
conditions such as the successful termination of the program or an 
uncaught exception that aborts the program execution abruptly. 
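
The state tuple and the assignment and branch transitions described above can be sketched as follows. This is a minimal illustration only: the class `SymState`, its string-based constraints, and the `sym_` naming of fresh symbols are assumptions made for this example and do not reflect SPF's actual data structures. No satisfiability check is performed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a symbolic state (IP, V, PC); not SPF's actual classes.
final class SymState {
    final int ip;                      // index of the next instruction
    final Map<String, String> values;  // V: variable -> symbolic expression
    final List<String> pathCondition;  // PC: conjunction of constraints

    SymState(int ip, Map<String, String> values, List<String> pathCondition) {
        this.ip = ip;
        this.values = Collections.unmodifiableMap(new HashMap<>(values));
        this.pathCondition = Collections.unmodifiableList(new ArrayList<>(pathCondition));
    }

    /** Initial state: inputs map to fresh symbols and PC = true (empty conjunction). */
    static SymState initial(String... inputs) {
        Map<String, String> v = new HashMap<>();
        for (String in : inputs) {
            v.put(in, "sym_" + in);
        }
        return new SymState(0, v, new ArrayList<>());
    }

    /** Assignment: IP advances, V is updated for the variable, PC is unchanged. */
    SymState assign(String variable, String symbolicValue) {
        Map<String, String> v = new HashMap<>(values);
        v.put(variable, symbolicValue);
        return new SymState(ip + 1, v, pathCondition);
    }

    /** Branch on c: returns the "then" state (PC and c) and the "else" state (PC and not c). */
    List<SymState> branch(String c, int thenIp, int elseIp) {
        List<String> pcThen = new ArrayList<>(pathCondition);
        pcThen.add(c);
        List<String> pcElse = new ArrayList<>(pathCondition);
        pcElse.add("!(" + c + ")");
        // A real engine would query a constraint solver here and drop unsatisfiable states.
        return Arrays.asList(new SymState(thenIp, values, pcThen),
                             new SymState(elseIp, values, pcElse));
    }
}
```

Keeping states immutable means that the two successors of a branch never share mutable path-condition lists, which mirrors the tree structure of symbolic execution.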

Although our approach can be customized for any symbolic ex- 
ecution system, we focus on Symbolic PathFinder (SPF) [ED that 
works at the Java bytecode level. 


2.2 Probability Theory 

The possible outcomes of an experiment are called elementary 
events. For example, the rolling of a 6-sided die may produce the
elementary events 1, 2, 3, 4, 5, and 6. Elementary events have to be
atomic, i.e., the occurrence of one of them excludes the occurrence
of any other. The set of all elementary events is called a sample 
space. In this paper, we consider only finite and countable sam- 
ple spaces, meaning that the underlying set of elementary events is 
countable and finite. 

Definition 1 (Probability distribution). Let $\Omega$ be
the sample space of an experiment. A probability distribution on
$\Omega$ is any function $Pr : \mathcal{P}(\Omega) \to [0, 1] \subset \mathbb{R}$ that satisfies the following
conditions (probability axioms):

• $Pr(\{x\}) \geq 0$ for every elementary event $x$

• $Pr(\Omega) = 1$

• $Pr(A \cup B) = Pr(A) + Pr(B)$ for all events $A, B$ with $A \cap B = \emptyset$

The pair $(\Omega, Pr)$ constitutes a probability space.

Definition 2 (Conditional probability). Let $(\Omega, Pr)$
be a probability space. Let $A$ and $B$ be events ($A, B \subseteq \Omega$), and
let $Pr(B) \neq 0$. The conditional probability of the event $A$ given that
the event $B$ occurs is:

$$Pr(A \mid B) = \frac{Pr(A \cap B)}{Pr(B)}$$

$Pr(A \mid B)$ is also referred to as the probability of $A$ given $B$.


Definition 3 (Law of total probability). Let $(\Omega, Pr)$
be a probability space and $\{B_n : n = 1, 2, 3, \ldots\}$ be a finite partition
of $\Omega$. Then, for any event $A$:

$$Pr(A) = \sum_n Pr(A \mid B_n) \cdot Pr(B_n)$$

The law of total probability can also be stated for conditional
probabilities:

$$Pr(A \mid X) = \sum_n Pr(A \mid X \cap B_n) \cdot Pr(B_n \mid X)$$

where the $B_n$ are defined as in Definition 3 and $X$ does not invalidate
the assumptions of Definition 2.
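
As a quick check of the law of total probability, consider again the 6-sided die. The worked instance below is an illustration only: the fairness of the die and the choice of the events $A$, $B_1$, and $B_2$ are assumptions made for this example.

```latex
% Fair 6-sided die; A = {1, 2, 3}; partition B_1 = {2, 4, 6} (even), B_2 = {1, 3, 5} (odd).
% A and B_1 share only the outcome 2; A and B_2 share the outcomes 1 and 3.
\begin{align*}
Pr(A) &= Pr(A \mid B_1) \cdot Pr(B_1) + Pr(A \mid B_2) \cdot Pr(B_2) \\
      &= \tfrac{1}{3} \cdot \tfrac{1}{2} + \tfrac{2}{3} \cdot \tfrac{1}{2} = \tfrac{1}{2}
\end{align*}
```

The result agrees with counting directly: three of the six equally likely outcomes belong to $A$.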


2.3 Probabilistic Analysis 

We follow previous work [0] where we defined a symbolic exe- 
cution framework for computing the probability of successful ter- 
mination (and alternatively the probability of failure) for a Java 
software component placed in a stochastic environment. A failure 
can be any reachable error, such as a failed assertion or an uncaught 
exception. For simplicity, we assume the satisfaction of target pro- 
gram properties to be characterized by the occurrence of a target 
event, but our work generalizes to bounded LTL properties [03]. 

To deal with loops, we run SPF using bounded symbolic exe- 
cution, i.e., a bound is set for the exploration depths. The result 
of symbolic execution is then a finite set of paths, each with a path 
condition. Some of these paths lead to failure, some of them to suc- 
cess (termination without failure) and some of them lead to neither 
success nor failure (they were interrupted because of the bounded 
exploration) - the latter are called grey paths. 

The path conditions produced by SPF consequently form three 
sets: $PC^s = \{PC^s_1, PC^s_2, \ldots, PC^s_m\}$, $PC^f = \{PC^f_1, PC^f_2, \ldots, PC^f_p\}$,
and $PC^g = \{PC^g_1, PC^g_2, \ldots, PC^g_q\}$, according to whether they lead
to success, failure, or were truncated. Note that the path conditions 
are disjoint and cover the whole input domain. In other words, the 
three sets form a complete partition of the input domain [0, 122] . 

Not all input values are equally likely, and we employ a usage 
profile to characterize the interaction of the software and the envi- 
ronment. It maps each valid combination of inputs to the probabil- 
ity with which it may occur. In [0] we provide an extensive treat- 
ment of usage profiles and how they are used for the probabilistic 
computations. For simplicity, and without loss of generality, we 
will assume here that the usage profile is embedded in the code. 
This can be done with every usage profile where the probabilities
$p_i$ are described by arbitrary-precision rational numbers. More
general usage profiles, such as Markov Chains, can be embedded 
as well; they are analyzed in a bounded way. 

Given the output of SPF, and assuming the constraints from the 
usage profile have been embedded in the code, the probability of 
success is defined as the probability of executing program P with 
an input satisfying any of the successful path conditions (recall the 
path conditions are disjoint): 

$$Pr^s(P) = \sum_i Pr(PC^s_i) \tag{1}$$

The failure probability $Pr^f(P)$ and “grey” probability $Pr^g(P)$ have
analogous definitions; it is straightforward to prove that
$Pr^s(P) + Pr^f(P) + Pr^g(P) = 1$. $Pr^g(P)$ can be used to quantify the impact
of the execution bound on the quality of the analysis ($1 - Pr^g(P)$).

In this paper we focus on sequential programs with integer in- 
puts. In other work we provide treatment of multi-threading UTS], 
input data structures [0], and floating-point inputs [0] (see Sec- 
tion 0). 

2.3.1 Quantification Procedure 

We compute the probabilities of path conditions using a quantifi- 
cation procedure (e.g., [0, 0 IHJ) for the path conditions. We use 
LattE [B] to count models for linear integer constraints but our work 
generalizes straightforwardly to other tools such as QCoral 03] (for 
arbitrary floating point constraints) and Korat |E0 (for heap data 
structures; see [0] for details). 

Given a finite integer domain $D$, model counting allows us to
compute the number of elements of $D$ that satisfy a given constraint
$c$; we denote this number by $\sharp(c)$ (a finite non-negative integer). By
definition 0ZOO, $Pr(c)$ is $\sharp(c)/\sharp(D)$ (where $\sharp(D)$ is the size of the
domain, implicitly assumed to be greater than zero).

Figure 1: Partial symbolic execution tree of the example. Root node $s_0$: true [$10^9$].

The success probability (or failure or grey probability) can then
be computed using model counting as follows.

$$Pr^s(P) = \sum_i Pr(PC^s_i) = \frac{\sum_i \sharp(PC^s_i)}{\sharp(D)} \tag{2}$$
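
To make the model-counting definition of $Pr(c)$ concrete, the following sketch replaces LattE with brute-force enumeration over a small cubic domain. The class `BruteForceCounter`, its `Constraint` interface, and the reduced domain [1..20] are assumptions made for this illustration; enumeration of this kind would not scale to the domains discussed in the paper.

```java
// Hypothetical sketch: brute-force model counting over a small cubic integer domain.
final class BruteForceCounter {
    /** A constraint over three integer inputs (illustrative interface). */
    interface Constraint {
        boolean holds(int x, int y, int z);
    }

    /** Counts the points (x, y, z) in [lo..hi]^3 that satisfy c. */
    static long count(Constraint c, int lo, int hi) {
        long n = 0;
        for (int x = lo; x <= hi; x++) {
            for (int y = lo; y <= hi; y++) {
                for (int z = lo; z <= hi; z++) {
                    if (c.holds(x, y, z)) {
                        n++;
                    }
                }
            }
        }
        return n;
    }

    /** Probability of c under a uniform profile: count(c) divided by the domain size. */
    static double probability(Constraint c, int lo, int hi) {
        long side = hi - lo + 1;
        return (double) count(c, lo, hi) / (side * side * side);
    }

    public static void main(String[] args) {
        // Two disjoint path conditions that together cover [1..20]^3.
        Constraint pcThen = (x, y, z) -> x <= 5;
        Constraint pcElse = (x, y, z) -> x > 5;
        System.out.println(probability(pcThen, 1, 20) + probability(pcElse, 1, 20));
    }
}
```

Because the two path conditions partition the domain, their probabilities sum to 1, which is the property used for the success, failure, and grey sets.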

3. EXAMPLE 

In this section we illustrate the proposed techniques with a simple
example. Consider the code in Listing 1. Assume the goal is to
estimate the success probability of the method test, i.e., the probability
of not reaching line 6, where an exception would be raised.
Assume the input variables x, y, and z range over the integer domain
[1..1000]. The size of the input domain is $10^3 \cdot 10^3 \cdot 10^3 = 10^9$
points. In practice the domains can be much larger. Note that the
size of the domain does not affect the complexity of the counting 
procedure, which only depends on the number of input variables 
and the number of constraints 23, EB0. 

Listing 1: Illustrative example

 1  void test(int x, int y, int z) {
 2    if (x <= 50) {
 3      // Do some work
 4    } else {
 5      if (x == 500 && y == 500 && z == 500) {
 6        assert false;
 7      }
 8      // Do more work
 9    }
10  }

Assuming a uniform usage profile, the probability of hitting the
failure (assert false) is $10^{-9}$, since there is only one point in the
input domain that can lead to failure.

To illustrate how sampling works, consider Figure 1, where a
part of the symbolic execution tree of Listing 1 is reported. For
each branching point (represented as a node) we show both the path
condition and the corresponding counter in square brackets. These
counters are initially computed by LattE as the paths are explored,
and stored for re-use. The first time a branch is encountered, the
counters are used to compute the probability of each alternative,
and a randomized choice is made accordingly (see Section 4.1). For
example, the probability of moving from $s_2$ to $s_3$ is 949/950; the
number of points satisfying PC at $s_3$ is $949 \cdot 10^6$ while the number
of those satisfying PC at $s_4$ is $10^6$, which together sum up to the
number of points in PC at the parent state $s_2$. In our approach, a
second simulation run would reuse this computation, making repeated
sampling efficient.

Statistical symbolic execution with IS starts the first iteration by
performing a small number of samples, as dictated by the probabilities
of the branching conditions. Assume for simplicity that at
the end of the first iteration, only the path $s_0 \to s_2 \to s_3$ has been
taken (perhaps multiple times); this is reasonable since the transitions
along this path have significantly higher probabilities than
their peers. The counter for the final PC along this path is $949 \cdot 10^6$.
This number is then subtracted from all the counters upward along
the path, yielding the updated counters in Figure 2. A new iteration
begins, where the sampling is now guided by the updated values of
the PC counters. Note that in this iteration, the transition from $s_2$ to
$s_3$ can never be taken, since its counter is 0. At the same time, notice
that the probability of following the path leading to the subtree
containing the error (rooted at $s_4$) has increased from $1/10^3$ in the
first iteration to 1/51 in the second iteration.

Figure 2: Symbolic execution tree: counters updated after one iteration.
Nodes: $s_0$: true [$51 \cdot 10^6$]; $s_1$: $x \leq 50$ [$50 \cdot 10^6$]; $s_2$: $x > 50$ [$10^6$];
$s_3$: $x \neq 500 \land x > 50$ [0]; $s_4$: $x = 500 \land x > 50$ [$10^6$].

Figure 3: Symbolic execution tree: counters updated after two iterations;
only one path left to explore.
Nodes: $s_0$: true [$10^6$]; $s_1$: $x \leq 50$ [0]; $s_2$: $x > 50$ [$10^6$];
$s_3$: $x \neq 500 \land x > 50$ [0]; $s_4$: $x = 500 \land x > 50$ [$10^6$].

Assuming the more likely path $s_0 \to s_1$ is sampled in the second
iteration, the counters are updated according to the numbers shown
in Figure 3. In the last iteration, the only remaining path
$s_0 \to s_2 \to s_4$ is taken, leading to the exploration of the subtree containing
the assert violation. Monte Carlo sampling without pruning would
miss the path leading to the violation, unless an infeasibly large
number of samples were taken. For example, after 20000 samples
the error is still undetected.
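
The counter updates of this example can be replayed with a few lines of code. The sketch below is illustrative only: the class `CounterPruning`, the map-based representation, and the node names used as keys are assumptions, while the counter values are those reported for the example tree.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the counter update used for pruning in the running example.
final class CounterPruning {
    /** Subtracts the counter of the last node on the path from every node on the path. */
    static void prune(Map<String, Long> counters, List<String> path) {
        long explored = counters.get(path.get(path.size() - 1));
        for (String node : path) {
            counters.put(node, counters.get(node) - explored);
        }
    }

    /** Initial counters for the uniform profile over [1..1000]^3. */
    static Map<String, Long> initialCounters() {
        Map<String, Long> c = new LinkedHashMap<>();
        c.put("s0", 1_000_000_000L); // true
        c.put("s1", 50_000_000L);    // x <= 50
        c.put("s2", 950_000_000L);   // x > 50
        c.put("s3", 949_000_000L);   // x != 500 and x > 50
        c.put("s4", 1_000_000L);     // x == 500 and x > 50
        return c;
    }

    public static void main(String[] args) {
        Map<String, Long> c = initialCounters();
        prune(c, Arrays.asList("s0", "s2", "s3")); // end of the first iteration
        System.out.println(c);
        prune(c, Arrays.asList("s0", "s1"));       // end of the second iteration
        System.out.println(c);
    }
}
```

After the first call the ratio of the counters of s2 and s0 is 1/51, which is the probability of entering the subtree that contains the error in the second iteration.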

After each iteration we also combine the information obtained 
from an exact analysis of pruned paths and a Bayesian inference 
over sampled paths (over the pruned state space) to determine if 
enough evidence was collected about the probabilities of the events 
of interest. For simplicity we omit the details for this example but 
we will describe this at length later in the paper. 

4. STATISTICAL SYMBOLIC EXECU- 
TION 

We first describe Statistical Symbolic Execution, which com- 
putes an approximate solution to the probability of success (or fail- 
ure) of a program, based on sampling carefully chosen program 
paths. Informed Sampling will be described in the next section. 

The basic idea is to address the probability computation as a sta- 
tistical inference problem. First, a randomized sampling procedure 
generates a finite number of simulation runs and classifies each of 
them as either satisfying or violating a given property $\phi$ (e.g., an
assertion in the code). Second, suitable statistical methods are applied
to either estimate the probability of $\phi$ from an analysis of the
samples or to test a hypothesis about this probability.

Similar techniques have already been explored in the literature 
on statistical model checking 033, E50, which typically phrase the 


statistical inference problem in the context of formal verification of 
probabilistic models, i.e., transition systems annotated with proba- 
bilities such as Markov Chains or Markov Decision Processes. 

We describe here how we adapted Bayesian statistical tech- 
niques 033] in the context of symbolic execution of Java programs, 
where no model is assumed and the probabilistic information is 
derived via model counting over the symbolic constraints (in com- 
bination with the usage profile). 

4.1 Monte Carlo Sampling of Symbolic Paths 

Typically, a Monte Carlo method defines the solution of the prob- 
lem as the parameter of a hypothetical population, and then gener- 
ates a random set of samples from which statistical estimates of this 
parameter can be obtained [O, Q3Q. 

In the context of symbolic execution, we define a sample as the 
simulation of one symbolic path. Whenever a branch is encoun- 
tered during such simulation, the decision to proceed along either 
of the alternative branches has to be taken according to the proba- 
bility of satisfying the corresponding branch conditions under the 
current usage profile. 

Every time a condition is encountered, the simulation has to decide
whether to follow the true or the false branch. In particular, let
$PC_{branch}$ be the path condition at the current state, and let $c$ be the
branching condition at that state. The path condition after taking
the “then” branch is $PC_{then} = c \land PC_{branch}$, while the path condition
after taking the “else” branch is $PC_{else} = \neg c \land PC_{branch}$.

Similar to [II (Ij], we associate with each of $PC_{branch}$, $PC_{then}$, and
$PC_{else}$ a counter of the number of points in the input domain satisfying
the path condition, identified by $C(PC_{branch})$, $C(PC_{then})$, and
$C(PC_{else})$, respectively. The first time a path condition $PC$ is encountered,
its counter is initialized through model counting to the
number of points of the input domain that satisfy $PC$: $C(PC) = \sharp(PC)$.
After its initialization, the value of each counter is stored and reused
throughout the simulation process.

We can compute the branch probabilities as follows:

$$p_{then} = \frac{\sharp(c \land PC_{branch})}{\sharp(PC_{branch})} = \frac{C(PC_{then})}{C(PC_{branch})}$$

$$p_{else} = \frac{\sharp(\neg c \land PC_{branch})}{\sharp(PC_{branch})} = \frac{C(PC_{else})}{C(PC_{branch})} \tag{3}$$

From Equation (3) it is straightforward to note that
$\sharp(PC_{then}) + \sharp(PC_{else}) = \sharp(PC_{branch})$ and $p_{then} + p_{else} = 1$.

The decision whether to take the then or the else branch can now 
be taken randomly according to their respective probabilities $p_{then}$
and $p_{else}$. It remains to show that making the sampling choices
locally at each branch is equivalent to making the choices over the 
complete PC, i.e., we do not introduce any statistical bias. This is 
implied by the following result: 

THEOREM 1. For a path with $PC = c_1 \land c_2 \land \ldots \land c_n$ and the
branching conditions encountered in the given order, the path probability
given by $Pr(PC)$ is equal to the product of the conditional
probabilities at each branch, given by
$Pr(c_1 \mid true) \times Pr(c_2 \mid c_1) \times Pr(c_3 \mid c_2 \land c_1) \times \ldots \times Pr(c_n \mid c_{n-1} \land \ldots \land c_1)$.

PROOF. From Section 2.3.1 we have that $Pr(PC) = \sharp(PC)/\sharp(D)$, where
$D$ is the complete finite domain, and from Equation (3) we can rewrite
the product of conditional probabilities as

$$\frac{\sharp(c_1)}{\sharp(D)} \times \frac{\sharp(c_1 \land c_2)}{\sharp(c_1)} \times \frac{\sharp(c_1 \land c_2 \land c_3)}{\sharp(c_1 \land c_2)} \times \ldots \times \frac{\sharp(c_1 \land \ldots \land c_n)}{\sharp(c_1 \land \ldots \land c_{n-1})}$$

which is equal to $\sharp(PC)/\sharp(D) = Pr(PC)$. □
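
The local sampling decision and the product of conditional probabilities from Theorem 1 can be sketched on the example tree as follows. The class `PathSampler`, its `Node` type, and the hard-coded tree are assumptions made for this illustration; in particular, the sketch assumes that every inner node has exactly two children and that counters have already been computed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: sampling one symbolic path from per-node input counters.
final class PathSampler {
    static final class Node {
        final String name;
        final long count;
        final Node thenChild;
        final Node elseChild;

        Node(String name, long count, Node thenChild, Node elseChild) {
            this.name = name;
            this.count = count;
            this.thenChild = thenChild;
            this.elseChild = elseChild;
        }

        boolean isLeaf() {
            return thenChild == null && elseChild == null;
        }
    }

    /** Walks from the root to a leaf, taking the "then" child with probability C(then)/C(node). */
    static List<Node> sample(Node root, Random rng) {
        List<Node> path = new ArrayList<>();
        Node cur = root;
        path.add(cur);
        while (!cur.isLeaf()) {
            double pThen = (double) cur.thenChild.count / cur.count;
            cur = rng.nextDouble() < pThen ? cur.thenChild : cur.elseChild;
            path.add(cur);
        }
        return path;
    }

    /** Product of the conditional branch probabilities along a path. */
    static double pathProbability(List<Node> path) {
        double p = 1.0;
        for (int i = 1; i < path.size(); i++) {
            p *= (double) path.get(i).count / path.get(i - 1).count;
        }
        return p;
    }

    /** Example tree with the initial counters; s4 is treated as a leaf for brevity. */
    static Node exampleTree() {
        Node s1 = new Node("s1", 50_000_000L, null, null);
        Node s3 = new Node("s3", 949_000_000L, null, null);
        Node s4 = new Node("s4", 1_000_000L, null, null);
        Node s2 = new Node("s2", 950_000_000L, s4, s3);
        return new Node("s0", 1_000_000_000L, s1, s2);
    }
}
```

Whatever path is drawn, the product returned by `pathProbability` equals the leaf counter divided by the root counter (up to floating-point rounding), so the local choices introduce no bias with respect to sampling whole path conditions.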


4.2 Bayesian Inference and Stopping Criteria 

The samples generated by the Monte Carlo simulation described
in the previous section need to be analyzed to estimate the probability
$p$ of the program to satisfy a given property $\phi$. Bayesian
statistical techniques exploit Bayes' theorem to update the prior
information on the probability $p$ after every observed sample. The
prior is a probability distribution that summarizes all the available
information (including its lack) gathered through sampling [0, EG].
As explained in [E5G], a prior for $p$ can be formalized via the Beta
distribution $\mathcal{B}(\alpha, \beta)$ (see details in [JED, 1231 E20]). By setting $\alpha$ and
$\beta$ it is possible to specify the initial assumption about $p$ as follows.
Assume the software engineer has an initial guess $\hat{p}$ about $p$, for
example based on the analysis of previous versions of the software
or on the quality of third-party components involved. One way of
encoding such knowledge as a prior distribution is:

$$\alpha = \hat{p} \cdot N_p \qquad \beta = (1 - \hat{p}) \cdot N_p \tag{4}$$

where $N_p > 1$ represents the “trust” in $\hat{p}$, as if it had been observed on $N_p$
samples. If no initial information is available, a “non-informative”
prior can be used, such as $\mathcal{B}(1/2, 1/2)$ [023G]. The meaning is that
we give the same chance (1/2) to both possible outcomes, and we
give small trust to it. We treat grey paths either optimistically or
pessimistically, meaning that they are considered as either success
or failure, as desired by the user.

When new samples are gathered, they are used to update the
prior, leading to the construction of the posterior distribution. In
particular, if $n$ samples have been collected with $n_s$ of them satisfying
$\phi$, the parameters $\alpha'$ and $\beta'$ of the posterior distribution are
computed as:

$$\alpha' = \alpha + n_s \qquad \beta' = \beta + n - n_s \tag{5}$$
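
The prior encoding and the posterior update above amount to simple parameter arithmetic, as the following sketch shows. The class `BetaPosterior` and its method names are assumptions introduced for this illustration.

```java
// Hypothetical sketch of the Beta prior encoding and the posterior parameter update.
final class BetaPosterior {
    final double alpha;
    final double beta;

    BetaPosterior(double alpha, double beta) {
        this.alpha = alpha;
        this.beta = beta;
    }

    /** Prior for an initial guess pHat, trusted as if it had been observed on nP samples. */
    static BetaPosterior fromGuess(double pHat, double nP) {
        return new BetaPosterior(pHat * nP, (1.0 - pHat) * nP);
    }

    /** Posterior after n samples, nS of which satisfied the property. */
    BetaPosterior update(long n, long nS) {
        return new BetaPosterior(alpha + nS, beta + (n - nS));
    }

    /** Expected value alpha / (alpha + beta) of the distribution. */
    double mean() {
        return alpha / (alpha + beta);
    }

    public static void main(String[] args) {
        BetaPosterior prior = BetaPosterior.fromGuess(0.5, 2.0); // Beta(1, 1)
        BetaPosterior posterior = prior.update(10, 8);           // Beta(9, 3)
        System.out.println(posterior.mean());
    }
}
```

Because the update only adds counts to the two parameters, it can be applied after every sample at negligible cost.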

This information can then be used for statistical estimation or 
hypothesis testing as explained below. 


4.2.1 Bayesian Estimation
We use Bayesian estimation [0ED, 1230] to compute a value that is
close to $p$ with high probability. More precisely, we compute an
estimate $\hat{p}_B$ such that:

$$Pr(\hat{p}_B - \epsilon < p < \hat{p}_B + \epsilon) \geq \delta \tag{6}$$

where $\epsilon > 0$ is the accuracy and $0 < \delta < 1$ is the confidence; the
accuracy determines how close the estimate has to be to the real
unknown $p$ and the confidence expresses how much this result can
be trusted [530].

Recalling that the posterior has a Beta distribution with parameters
$\alpha'$ and $\beta'$, Equation (6) can be restated as:

$$F_{\mathcal{B}(\alpha', \beta')}(\hat{p}_B + \epsilon) - F_{\mathcal{B}(\alpha', \beta')}(\hat{p}_B - \epsilon) \geq \delta \tag{7}$$

where $F_{\mathcal{B}(\alpha', \beta')}(\cdot)$ is the cumulative distribution function of the
posterior distribution, i.e., it computes the probability for a random
variable distributed according to the posterior to assume a value
less than or equal to the argument [0200].

From the correctness of Bayesian estimation [0O, E3D] (i.e., it always
converges to the real value of $p$ after enough samples are collected),
Equation (7) can be used as a sequential stopping criterion
to decide how many samples are needed to achieve the accuracy
and confidence goals.

If the estimation converges with the prescribed accuracy and
confidence, the estimate $\hat{p}_B$ is defined as the expected value of the
posterior distribution:

$$\hat{p}_B = \frac{\alpha'}{\alpha' + \beta'} \tag{8}$$

An estimate of the number of samples required to achieve
the accuracy and confidence goals is discussed in [Mj]. In general,
this number is highly sensitive to the accuracy parameter, while
increasing the prescribed confidence has a lower impact on the number
of samples.


4.2.2 Bayesian Hypothesis Testing
We use hypothesis testing as an alternative stopping criterion for
termination. Hypothesis testing [020, 1230] is a statistical method for
deciding, with enough confidence, whether the unknown probability
$p$ is at least a given threshold $\theta$ ($H_0 : p \geq \theta$). Alternatively,
we may want to evaluate the complementary hypothesis $H_1 : p < \theta$.

Similar to estimation, hypothesis testing starts from prior knowledge
and updates it with the information obtained through sampling
until enough evidence is provided in support of either $H_0$ or
$H_1$. The procedure aims at estimating the odds for hypothesis $H_0$
versus $H_1$, which can be computed as follows [033]:

$$\frac{Pr(H_0 \mid S)}{Pr(H_1 \mid S)} = \frac{Pr(S \mid H_0)}{Pr(S \mid H_1)} \cdot \frac{Pr(H_0)}{Pr(H_1)} \tag{9}$$

where $S$ is the set of samples collected, and $Pr(H_0)$ and $Pr(H_1)$
are the probabilities of the two hypotheses being true given the prior
knowledge: $Pr(H_0) = 1 - F_{\mathcal{B}(\alpha, \beta)}(\theta)$ and $Pr(H_1) = 1 - Pr(H_0)$.

The ratio $Pr(S \mid H_0) / Pr(S \mid H_1)$ is called the Bayes factor and can
be used as a measure of relative confidence in $H_0$ versus $H_1$ [0231
E50], i.e., it quantifies how many times $H_0$ is more likely to be true
than $H_1$ given the evidence collected through sampling. The values
$Pr(H_0 \mid S)$ and $Pr(H_1 \mid S)$ represent the probabilities of the two
hypotheses being true after the samples $S$ have been collected. Since
all the information gathered from the samples is embedded in the
posterior distribution, the latter is used to compute
$Pr(H_0 \mid S) = 1 - F_{\mathcal{B}(\alpha', \beta')}(\theta)$ and $Pr(H_1 \mid S) = 1 - Pr(H_0 \mid S)$. Thus, the Bayes
factor $B$ corresponding to the posterior odds for hypothesis $H_0$ can
be computed from Equation (9) after some algebraic simplifications
as:

$$\frac{Pr(S \mid H_0)}{Pr(S \mid H_1)} = \frac{Pr(H_1)}{Pr(H_0)} \cdot \left( \frac{1}{F_{\mathcal{B}(\alpha', \beta')}(\theta)} - 1 \right) \tag{10}$$

If no preference among the two hypotheses is provided by the prior,
e.g., when a non-informative prior is used, the initial value of the
ratio $Pr(H_1)/Pr(H_0)$ is 1.

Equation (10) can be used to define a sequential stopping criterion.
In particular, sampling can stop when the odds in favor of
one of the hypotheses (the Bayes factor $B$) is greater than a given
threshold $T$, i.e., when a relative confidence of at least $T$ is obtained
from data to support one of the hypotheses. A precise quantification
of the number of samples needed to achieve convergence for
Bayesian hypothesis testing is discussed in [BT1B.NJ]. In general,
hypothesis testing is faster than estimation, although its performance
degrades when $\theta$ is close to the (unknown) probability $p$ [B1I].
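
The two stopping tests, Equation (7) for estimation and Equation (10) for hypothesis testing, can be sketched as follows. To stay within the Java standard library, the sketch is restricted to integer Beta parameters, for which the cumulative distribution function is a binomial tail sum; this restriction, the class `BayesStopping`, and the use of a uniform Beta(1, 1) prior in the usage example are assumptions of this illustration and exclude the non-informative prior with parameters 1/2.

```java
// Hypothetical sketch of the two stopping tests; integer Beta parameters only.
final class BayesStopping {
    /** CDF of Beta(a, b) at x for integers a, b >= 1, written as a binomial tail sum. */
    static double betaCdf(int a, int b, double x) {
        if (x <= 0.0) return 0.0;
        if (x >= 1.0) return 1.0;
        int n = a + b - 1;
        double logX = Math.log(x);
        double log1mX = Math.log1p(-x);
        double logBinom = 0.0; // log C(n, 0)
        double sum = 0.0;
        for (int j = 0; j <= n; j++) {
            if (j >= a) {
                sum += Math.exp(logBinom + j * logX + (n - j) * log1mX);
            }
            if (j < n) {
                // C(n, j + 1) = C(n, j) * (n - j) / (j + 1)
                logBinom += Math.log(n - j) - Math.log(j + 1);
            }
        }
        return Math.min(1.0, sum);
    }

    /** Estimation test: posterior mass within eps of the posterior mean is at least delta. */
    static boolean estimationConverged(int a, int b, double eps, double delta) {
        double mean = (double) a / (a + b);
        return betaCdf(a, b, mean + eps) - betaCdf(a, b, mean - eps) >= delta;
    }

    /** Bayes factor for H0: p >= theta from prior (a0, b0) and posterior (a1, b1). */
    static double bayesFactor(int a0, int b0, int a1, int b1, double theta) {
        double prH0 = 1.0 - betaCdf(a0, b0, theta);
        double prH1 = 1.0 - prH0;
        return (prH1 / prH0) * (1.0 / betaCdf(a1, b1, theta) - 1.0);
    }

    public static void main(String[] args) {
        // Uniform prior Beta(1, 1); 18 successes in 20 samples give posterior Beta(19, 3).
        System.out.println(estimationConverged(19, 3, 0.2, 0.95));
        System.out.println(bayesFactor(1, 1, 19, 3, 0.5));
    }
}
```

In a sampling loop, either test would be evaluated after each posterior update; sampling stops once `estimationConverged` returns true, or once the Bayes factor (or its reciprocal, for the complementary hypothesis) exceeds the chosen threshold $T$.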

5. INFORMED SAMPLING 

A weakness of statistical analysis is the large number of paths
that may need to be explored and the slow convergence to a result
within the desired confidence. To address this problem, we
introduce here Informed Sampling (IS), an iterative technique that
combines Monte Carlo sampling with pruning of already explored
paths. Furthermore, to obtain a precise estimation of the probability
of satisfying property $\phi$, IS combines information from two
sources: the first is based on the exact probabilistic analysis (described
in Section 2.3) for the pruned paths and the second is based
on Bayesian inference (as described in Section 4.2) for the sampled
paths. We describe IS in more detail below.

Algorithm 1: Statistical Symbolic Execution with Informed Sampling

 1  exploredD ← 0;
 2  successD ← 0;
 3  repeat
 4      numSamples ← 0;
 5      numSuccess ← 0;
 6      successPCs ← {};
 7      exploredPCs ← {};
 8      repeat
 9          π ← MonteCarloSample();
10          let PC be the path condition of path π;
11          numSamples ← numSamples + 1;
12          exploredPCs ← exploredPCs ∪ {PC};
13          if π ⊨ φ then
14              numSuccess ← numSuccess + 1;
15              successPCs ← successPCs ∪ {PC};
16          end
17          updatePrior();
18      until StopCombinedEst() ∨ numSamples ≥ N_I;
19      exploredD ← exploredD + ♯(exploredPCs);
20      successD ← successD + ♯(successPCs);
21      if StopCombinedEst() then
22          return;
23      end
24      pruneOutPaths(exploredPCs);
25  until exploredD = domainSize;
26  return;

5.1 Algorithm 

Symbolic execution with Informed Sampling is described at a
high level by Algorithm 1. Assume for simplicity that we are
interested in the success probability of the program with respect to a
property $\phi$ (the algorithm can also be applied to failure
probability with only minor modifications). $N_I$ is a pre-specified
maximum number of samples per iteration. Assume also that we treat
the grey paths optimistically.

The algorithm works through a number of iterations (lines 3-25). 
At each iteration, IS first tries to tackle the verification problem 
through Bayesian inference. For this task, it takes a pre-specified 
number of Monte Carlo samples (lines 8-18) as dictated by the 
conditional probabilities computed from the code. At each itera- 
tion, the algorithm keeps track of the following values: 

• numSamples counts the number of sampled paths 

• numSuccess counts the number of sampled paths that lead to 
success 

• exploredPCs stores the PCs of explored paths 

• successPCs stores the PCs of explored paths that lead to suc- 
cess 

The algorithm also computes exploredD and successD which 
keep count of total explored inputs and explored inputs that lead 
to success. These values are computed using model counting (lines 
19-20) and are used in the combined estimators as described below. 

As before, we use as stopping criteria for sampling either
Bayesian estimation or hypothesis testing (high-level procedure
StopCombinedEst() in lines 18 and 21). However, for IS we use
combined estimators that enhance the Bayesian inference with
precise information obtained from symbolic paths. If the (combined)
Bayesian estimator converges to the desired confidence or if the
(combined) estimated probability satisfies the hypothesis, this
result is reported and the analysis stops. The iterative process can
also terminate when the whole domain has been analyzed (line 25).
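
As an illustration, the iteration structure of Algorithm 1 can be sketched in Python as follows. This is only a minimal sketch under simplifying assumptions that are not part of the original text: the program is modeled as a flat list of symbolic paths with known model counts (the `Path` class and `toy_paths` are hypothetical), and `coverage_stop` is a stand-in for StopCombinedEst() that stops once 99% of the domain has been analyzed exactly, rather than the combined Bayesian criteria described in this section.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """A symbolic path: label, model count #(PC), and whether it satisfies phi."""
    name: str
    count: int
    success: bool


def informed_sampling(paths, n_i, stop, rng):
    """Iteration structure of Algorithm 1 over a flat list of symbolic paths.

    stop(success_d, explored_d, domain_size, num_success, num_samples) is a
    stand-in for StopCombinedEst(). Returns (success_d, explored_d, iterations).
    """
    domain_size = sum(p.count for p in paths)
    remaining = list(paths)              # paths that have not been pruned yet
    explored_d = success_d = iterations = 0
    while True:                          # lines 3-25
        iterations += 1
        num_samples = num_success = 0
        explored_pcs = set()
        while True:                      # lines 8-18
            # A path is sampled proportionally to the inputs it covers.
            weights = [p.count for p in remaining]
            path = rng.choices(remaining, weights=weights)[0]
            num_samples += 1
            explored_pcs.add(path)
            if path.success:
                num_success += 1
            if (stop(success_d, explored_d, domain_size, num_success, num_samples)
                    or num_samples >= n_i):
                break
        # Lines 19-20: exact counts for the paths seen in this iteration.
        explored_d += sum(p.count for p in explored_pcs)
        success_d += sum(p.count for p in explored_pcs if p.success)
        if stop(success_d, explored_d, domain_size, num_success, num_samples):
            return success_d, explored_d, iterations
        # Line 24: prune the explored paths so they are never sampled again.
        remaining = [p for p in remaining if p not in explored_pcs]
        if explored_d == domain_size:    # line 25
            return success_d, explored_d, iterations


def coverage_stop(success_d, explored_d, domain_size, num_success, num_samples):
    """Hypothetical stopping rule: at least 99% of the domain analyzed exactly."""
    return 100 * explored_d >= 99 * domain_size


# Hypothetical program with four symbolic paths over a domain of 10000 inputs.
toy_paths = [
    Path("nominal", 9000, True),
    Path("degraded", 900, True),
    Path("abort_a", 90, False),
    Path("abort_b", 10, False),
]
result = informed_sampling(toy_paths, n_i=20, stop=coverage_stop,
                           rng=random.Random(0))
print(result)  # (successD, exploredD, number of iterations)
```

Because the most likely paths are sampled and pruned first, a run on such a skewed toy domain typically needs very few iterations before most of the domain is covered exactly.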

After each iteration the symbolic paths explored so far are pruned
out of the execution tree (line 24) and analyzed using the exact
method (Section □); the results are used to build the combined
estimator. This improves the efficiency of the inference procedure
because it accounts for all the information obtainable from the path
conditions of explored paths. Indeed, each sampled path has a
path condition which is used in the exact analysis to quantify how
many input values from the domain will follow the execution along
that path. For example, referring to Figure □, the symbolic path
$s_0 \to s_2 \to s_3$ accounts for more concrete program paths than the
path $s_0 \to s_1$; however, this information is ignored by the purely
statistical inference, which treats symbolic paths as concrete paths.

Pruning using Counters. 

Recall that for each path condition PC we maintain a counter 
C(PC) to count the number of solutions. Initially these counters 
are computed using off-the-shelf quantification procedures such as 
LattE. At each iteration, IS performs sampling, as guided by the PC 
counters (see Section ED). For each sampled (non-duplicate) path, 
with final PC counter n, IS updates all the counters for the prefixes 
of PC along the path (to the root of the symbolic execution tree) 
by subtracting n, and a new iteration starts (with the updated coun- 
ters). Thus, for each pruned PC only a small number of arithmetic 
operations is required, with no significant impact on the overall 
computation time. 

At the end of each iteration, each counter records the number of
inputs that follow the corresponding path prefix and have not been
explored yet. If a counter becomes 0 it means that the sub-tree rooted
at that node has been fully explored, and it does not need to be
sampled again. Therefore we can safely prune it from the search space.
If the counter of the root node becomes 0 the analysis stops, because
the whole domain was analyzed exactly.

After each pruning, exact information is obtained for a fraction
of the input domain. This fraction no longer needs to be considered
for statistical inference, allowing the latter to focus on the
remaining part of the domain. Furthermore, the overall confidence in
the result grows, since there is no uncertainty about the fraction of
the domain analyzed exactly.
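
The counter mechanism can be sketched as follows. This is a minimal illustration, not the Symbolic PathFinder implementation: the `Node` class, the tree shape, and the leaf counts are hypothetical, and the leaf counters are given directly instead of being computed by a model counter such as LattE.

```python
import random


class Node:
    """Node of a symbolic execution tree; `count` plays the role of C(PC)."""

    def __init__(self, name, count=0, children=()):
        self.name = name
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self
        # An inner node covers the inputs of all its children.
        if self.children:
            self.count = sum(child.count for child in self.children)
        else:
            self.count = count


def sample_leaf(root, rng):
    """Walk down from the root, choosing each branch proportionally to its
    counter. Must only be called while root.count > 0."""
    node = root
    while node.children:
        live = [c for c in node.children if c.count > 0]  # counter 0 = pruned
        node = rng.choices(live, weights=[c.count for c in live])[0]
    return node


def prune(leaf):
    """Subtract the leaf counter n from the leaf and from all its prefixes."""
    n = leaf.count
    node = leaf
    while node is not None:
        node.count -= n
        node = node.parent
    return n


# Hypothetical tree: root -> a -> {a1, a2}, root -> b.
demo_root = Node("root", children=[
    Node("a", children=[Node("a1", 600), Node("a2", 300)]),
    Node("b", 100),
])
demo_rng = random.Random(7)
while demo_root.count > 0:          # stop when the whole domain is covered
    leaf = sample_leaf(demo_root, demo_rng)
    print(leaf.name, prune(leaf), "inputs pruned;", demo_root.count, "left")
```

Each pruning step touches only the nodes between the sampled leaf and the root, which is why the update cost is a handful of subtractions per pruned path.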

Estimation with IS.

The combined estimator, denoted here as $\hat{\mu}$, is defined through
the mixture of an exact estimator, denoted $\mu_E$, and a Bayesian
estimator, denoted $\hat{\mu}_B$. E refers to the inputs that follow the
paths explored in previous iterations of IS (and can therefore be
analyzed Exactly), while B refers to the inputs that have not been
explored yet (and therefore can only be used in Bayesian estimation).
A hat ("^") denotes an approximate value.

For the input points that have already been explored, we can
compute the exact probability $\mu_E$. Recall that successD denotes
the number of input points corresponding to the pruned successful
paths and exploredD is the total number of points corresponding to
pruned paths. Then:

$$\mu_E = \frac{\mathit{successD}}{\mathit{exploredD}} \tag{11}$$

and for the rest of the input domain we have at each iteration just
the Bayesian estimator:

$$\hat{\mu}_B = \frac{\mathit{numSuccess} + \alpha}{\mathit{numSamples} + \alpha + \beta} \tag{12}$$

where for both $\alpha$ and $\beta$ we use 1/2 as default. By the law of
total probability (Definition ED) we can combine the exact and Bayesian
estimators:

$$\hat{\mu} = (1 - f_E) \cdot \hat{\mu}_B + f_E \cdot \mu_E \tag{13}$$

where $f_E$ is the fraction of the domain that has been pruned out
up to the previous iteration, i.e., $\mathit{exploredD} / \#(D)$. The
number of samples to take at each iteration is decided according to a
sequential stopping criterion, and it is bounded by the maximum value
$N_I$ provided by the user.

Hypothesis Testing with IS.

For hypothesis testing recall that we base the decision on the
posterior odds of the hypothesis $H_0: \Pr(P \models \phi) \ge \theta$
versus $H_1: \Pr(P \models \phi) < \theta$. For IS we compute the
posterior odds based on a combined estimator similar to the one
described in Equation (13):

$$\hat{\mu}^{H_0} = (1 - f_E) \cdot \hat{\mu}_B^{H_0} + f_E \cdot \mu_E^{H_0} \tag{14}$$

where $\hat{\mu}_B^{H_0}$ is the Bayesian posterior estimator defined in
Section 4.2.2 for the probability $\Pr(H_0 | S)$, i.e.,
$1 - F_{(\alpha', \beta')}(\theta)$, where S is the set of samples taken
during the current iteration and $\alpha' = \alpha + \mathit{numSuccess}$
and $\beta' = \beta + \mathit{numSamples} - \mathit{numSuccess}$ are
the parameters of the posterior Beta distribution of the Bayesian
estimator. $\mu_E^{H_0}$ is equal to 1 if the result $\mu_E$ of the
partial exact analysis is greater than or equal to $\theta$, and equal
to 0 otherwise; $f_E$ is the fraction of the domain that has been pruned
out up to the previous iteration, as described in the previous section.

Termination.

If IS explores the whole domain, that is $\mathit{exploredD} = \#(D)$,
the process terminates with the same results as for the exact
analysis. Since at each iteration the number of samples to collect for
Bayesian inference is greater than zero, IS is guaranteed to
terminate, in the worst case, when the whole domain has been analyzed
exactly. (Note that we assumed the domain is finite.)

Early Termination.

We further enhance the IS procedure to check for additional
sufficient termination conditions determined by the partial exact
analysis of pruned paths. Indeed, the actual value of $\mu$ is by
definition in the interval:

$$\frac{\mathit{successD}}{\#(D)} \le \mu \le 1 - \frac{\mathit{failD}}{\#(D)} \tag{15}$$

where failD = exploredD − successD.

We use these lower and upper bounds to test against the hypothesis
and decide early termination of the IS procedure. Indeed,
if $\mathit{successD} / \#(D) \ge \theta$ the hypothesis is necessarily
true; while if $1 - \mathit{failD} / \#(D) < \theta$ the hypothesis is
necessarily false. In both cases we stop the iterative process and
return the result to the user. This check is performed in
StopCombinedEst().


5.2 Discussion


Combined Estimators are Unbiased and Consistent.

The construction of the combined estimator of Equation (13) is
an application of stratified sampling, where the population (the
input domain) is partitioned into disjoint subsets to be analyzed
independently; the local results are then linearly composed, assigning
each one a weight proportional to the size of the corresponding
subset [EO. An estimator obtained through stratified sampling is
unbiased (i.e., its expected value converges to the measure it
estimates) and consistent (i.e., its variance converges to 0 when the
number of samples goes to $\infty$) if the local estimators used for
each subset of the partition are unbiased and consistent [ 0 .

For the portion of the domain analyzed exhaustively, $\mu_E$ is by
definition the actual measure it estimates. Thus it is trivially
unbiased and consistent (indeed the variance of a number is always
zero). For the portion of the domain subject to statistical
estimation, we adopt the standard Bayesian estimator for the parameter
of a Bernoulli distribution. Proofs that it is unbiased and consistent
can be found, for example, in OH E3, E5Q. Thus, the combined
estimator is in turn unbiased and consistent.

Faster Convergence for Bayesian Estimation.

A benefit of mixing the Bayesian estimator $\hat{\mu}_B$ with $\mu_E$
is a faster convergence to the criterion of Equation (H). Indeed, if
an input falls in the portion of the domain analyzed exactly, our
estimate is perfectly accurate (with confidence 1) by definition.
Otherwise it will provide confidence $\delta_B$:

$$\delta = (1 - f_E) \cdot \delta_B + f_E \tag{16}$$

Thus, to meet the prescribed confidence $\delta$ as a whole, the
Bayesian estimator is required to just satisfy the relaxed confidence
$\delta_B$ obtained by solving Equation (16):

$$\delta_B = \frac{\delta - f_E}{1 - f_E} \tag{17}$$

During the first iteration, when $f_E = 0$, $\delta_B$ needs to satisfy
the original convergence criterion of Bayesian estimation (i.e., the
prescribed $\delta$). However, with each iteration, $f_E$ increases,
thus relaxing the constraint on $\delta_B$.
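
To make the arithmetic concrete, the combined estimator and the relaxed confidence can be sketched as below. The function names and the numbers in the usage lines are illustrative assumptions; returning 0 once the pruned fraction reaches the prescribed confidence is also an assumption of this sketch (at that point the relaxed requirement is no longer positive).

```python
def combined_estimate(success_d, explored_d, domain_size,
                      num_success, num_samples, alpha=0.5, beta=0.5):
    """Return (mu_hat, f_e): the mixture of the exact and Bayesian estimators."""
    f_e = explored_d / domain_size                      # pruned fraction
    # Bayesian estimator on the samples of the current iteration.
    mu_b = (num_success + alpha) / (num_samples + alpha + beta)
    # Exact estimator on the pruned part (irrelevant while nothing is pruned).
    mu_e = success_d / explored_d if explored_d else 0.0
    return (1 - f_e) * mu_b + f_e * mu_e, f_e


def relaxed_confidence(delta, f_e):
    """Confidence the Bayesian estimator must reach for overall confidence
    delta, given that a fraction f_e of the domain is known exactly."""
    if f_e >= delta:
        return 0.0  # assumption of this sketch: nothing left to require
    return (delta - f_e) / (1 - f_e)


# Hypothetical state: 90% of a 10000-input domain pruned, all of it successful,
# and 90 successes out of 100 samples in the current iteration.
mu_hat, f_e = combined_estimate(9000, 9000, 10000, 90, 100)
print(mu_hat, f_e, relaxed_confidence(0.99, f_e))
```

With 90% of the domain pruned, a prescribed confidence of 0.99 only requires the Bayesian part to reach a confidence of about 0.9.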

Faster Convergence for Hypothesis Testing. 

As for Bayesian hypothesis testing, the process terminates as
soon as the odds in favor of $H_0$ overcome those in favor of $H_1$
by a factor T decided by the user. To understand the benefit
in terms of the convergence rate provided by the IS estimator of
Equation (14), we need to consider the ratio of the posterior odds
$\Pr(H_0 | S) / \Pr(H_1 | S)$. If $H_0$ is actually true, $\Pr(H_0 | S)$
will converge to 1 (and consequently $\Pr(H_1 | S)$ to 0) as more
samples are collected. Conversely, if $H_0$ is false, $\Pr(H_0 | S)$
will converge to 0 (and $\Pr(H_1 | S)$ to 1). The convergence of the
estimator $\hat{\mu}^{H_0}$ can be evaluated again considering $f_E$.
Since after each iteration $f_E$ grows, the room for the uncertainty
derived from the use of Bayesian estimation is always bounded by a
decreasing factor $1 - f_E$. The more execution paths are pruned out
and analyzed exactly, the more such uncertainty is reduced, usually
speeding up the convergence of the combined estimator.
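
The combined estimator for hypothesis testing can be sketched as follows. This is an illustrative sketch, not the tool's code: `beta_cdf` is a hand-written, standard-library-only evaluation of the Beta distribution function (regularized incomplete beta function via a continued fraction), used here in place of a statistics library, and all names and numbers are assumptions.

```python
from math import exp, lgamma, log


def _beta_cf(a, b, x, max_iter=500, eps=1e-12):
    """Continued fraction of the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:   # last correction factor is close to 1
            break
    return h


def beta_cdf(x, a, b):
    """Distribution function F_(a,b)(x) of a Beta(a, b) random variable."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def combined_h0_estimate(theta, success_d, explored_d, domain_size,
                         num_success, num_samples, alpha=0.5, beta=0.5):
    """Mix Pr(H0 | S) from the posterior Beta with the exact 0/1 verdict."""
    f_e = explored_d / domain_size
    a_post = alpha + num_success
    b_post = beta + num_samples - num_success
    mu_b_h0 = 1.0 - beta_cdf(theta, a_post, b_post)    # Pr(p >= theta | S)
    mu_e = success_d / explored_d if explored_d else 0.0
    mu_e_h0 = 1.0 if mu_e >= theta else 0.0
    return (1.0 - f_e) * mu_b_h0 + f_e * mu_e_h0


# Hypothetical state: 80% of the domain pruned, 40 successes in 50 samples.
print(combined_h0_estimate(0.7, 7000, 8000, 10000, 40, 50))
```

The weight $1 - f_E$ on the Bayesian term shrinks with every pruning round, so the statistical uncertainty contributes less and less to the combined value.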

Detecting Errors with Random Exploration. 

The iterative pruning of the input domain increases the chances
of random exploration to detect errors. To show this, let us consider
an error path with path condition $PC_R$. Let $B^i$ represent the set
of the paths targeted by random sampling during iteration i, and let
$B^{i+1}$ represent the set of paths targeted by sampling during
iteration i + 1. If the error path is not detected at iteration i, we
will show that the probability of catching $PC_R$ is higher at
iteration i + 1.

Let us assume, for simplicity, that only one path is sampled per
iteration (the worst case for our proof). The probability of sampling
$PC_R$ at iteration i is $\Pr(PC_R | B^i)$. If it is sampled, then the
error has been detected. Otherwise a sampled path with condition
$PC^i$ is removed from $B^i$. Since $B^{i+1} = B^i - PC^i$ it follows
that at iteration i + 1, the probability $\Pr(PC_R | B^{i+1})$ of
catching $PC_R$ is higher than in the previous iteration:

$$\Pr(PC_R | B^i) = \Pr(PC_R | PC^i) \cdot \Pr(PC^i) + \Pr(PC_R | B^{i+1}) \cdot \Pr(B^{i+1}) \tag{18}$$

Note that $\Pr(PC_R | PC^i) = 0$ because we assumed that the sampled
path with $PC^i$ was not the error path with $PC_R$; it follows that:

$$\Pr(PC_R | B^{i+1}) = \frac{\Pr(PC_R | B^i)}{\Pr(B^{i+1})} \tag{19}$$

Again, assuming that $PC_R$ has not been detected yet, necessarily
$B^{i+1} \ne \emptyset$ and thus $\Pr(B^{i+1}) > 0$. The example in
Section □ illustrates this phenomenon: the error is very hard to
detect with purely random exploration but it can be easily detected
with IS.
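
A small numeric illustration of this argument is sketched below, with hypothetical path names and input counts (none of them come from the case studies). It computes the probability of sampling the error path before and after each pruning step using exact fractions.

```python
from fractions import Fraction


def hit_probabilities(counts, error_path, pruned_order):
    """Probability of sampling `error_path` initially and after each pruning.

    `counts` maps a path name to the number of inputs it covers; paths in
    `pruned_order` are removed one per iteration (none may be the error path).
    """
    remaining = dict(counts)
    probs = [Fraction(remaining[error_path], sum(remaining.values()))]
    for name in pruned_order:
        if name == error_path:
            raise ValueError("the error path is assumed not to be sampled yet")
        del remaining[name]
        probs.append(Fraction(remaining[error_path], sum(remaining.values())))
    return probs


# Hypothetical domain of 10000 inputs with one rare error path.
toy_counts = {"main": 9000, "rare": 990, "error": 10}
toy_probs = hit_probabilities(toy_counts, "error", ["main", "rare"])
print([str(p) for p in toy_probs])
```

In this toy domain, pruning the 9000-input path leaves a fraction 1/10 of the domain to sample from, so the probability of hitting the error path grows by a factor of 10.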


Number of Samples; Incremental Symbolic Execution. 

The maximum number of samples to take in each iteration of IS
allows us to select different operation modes for the algorithm. If
a very large number of samples is allowed during each iteration,
IS reduces to Bayesian inference as described in Section E2L. On
the other hand, if $N_I = 1$ the impact of the Bayesian estimation
becomes negligible, since it will almost surely not converge after a
single sample, making IS perform an incremental exact analysis by
selecting, pruning, and analyzing one symbolic path per iteration.

Thus IS can be used to improve on "classical" symbolic execution
by providing for a new kind of incremental analysis where the
next path to be analyzed is selected according to the Monte Carlo
sampling described in Section 4.1. In this way IS will likely cover
the most probable paths first, computing also the fraction of the
domain these paths cover. This results in an "any time" approach
where it is possible to interrupt the execution when enough of the
input domain has been covered, even if the analysis cannot be
exhaustively completed within a reasonable time.

Values of $N_I$ between the two extremes trade off the effort
Bayesian estimation is allowed to take to converge during a single
iteration with the number of iterations required to converge.
Choosing a good value for $N_I$ depends on the specific problem. We
will discuss its choice for several applications in Section 0. Another
option is to "adapt" the value of $N_I$ with the number of iterations,
e.g., by starting small to quickly prune out paths with high
likelihood of execution and gradually increasing the value of $N_I$ to
stress-test the parts of the state space that have a small likelihood
of execution.


False Positives or Negatives. 

Statistical hypothesis testing, being a randomized procedure op- 
erating on a limited number of samples, may produce false neg- 
atives or positives, i.e., it may reject a hypothesis that is actually 
true and vice versa. This problem can occur especially when the 
analyzed programs are very large and the probability of success or 
failure is close to the extremes (0 or 1) |B41|. In the next section we 
show an instance of the problem. For Bayesian hypothesis testing, 
it has been proved that the probability of obtaining spurious results 
is bounded by 1 /T [ED, where T is the threshold set by the user 
(see Section B ’ 'l l 

For IS, pruning reduces the possibility of spurious results since
it limits the possibility of wrong conclusions to the fraction of the
domain analyzed with the Bayesian estimator ($1 - f_E$). Also note
that the sufficient conditions that we added to IS, for early
termination with hypothesis testing, do not suffer from incorrect
results because they rely on exact methods. Thus they improve the
quality of the overall approach since if IS terminates due to the
sufficient conditions, its results are always correct.
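
The sufficient conditions for early termination can be sketched as follows; the function names and the example counts are illustrative assumptions. Exact fractions are used so that the decision involves no rounding.

```python
from fractions import Fraction


def exact_bounds(success_d, explored_d, domain_size):
    """Lower and upper bounds on the success probability from pruned paths."""
    fail_d = explored_d - success_d
    lower = Fraction(success_d, domain_size)
    upper = 1 - Fraction(fail_d, domain_size)
    return lower, upper


def early_decision(theta, success_d, explored_d, domain_size):
    """True or False when H0 (probability >= theta) is settled by the exact
    bounds alone, None when sampling has to continue."""
    lower, upper = exact_bounds(success_d, explored_d, domain_size)
    if lower >= theta:
        return True
    if upper < theta:
        return False
    return None


# Hypothetical counts over a domain of 10000 inputs.
print(early_decision(Fraction(1, 2), 6000, 7000, 10000))  # settled: True
print(early_decision(Fraction(1, 2), 1000, 7000, 10000))  # settled: False
print(early_decision(Fraction(1, 2), 3000, 5000, 10000))  # undecided: None
```

Since both bounds come from model counting on pruned paths, a decision returned here cannot be a false positive or a false negative.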

6. EXPERIENCE 

We implemented the statistical symbolic execution techniques 
described in this paper in the context of SPF Lt22lJ, an open-source 
toolset. We plan to make our tool available for download. Sampling 
is parallelized using a map-reduce algorithm. Path counters are 
shared and reused in subsequent sampling phases. 

In this section we compare IS with both an exhaustive analysis 
and a purely statistical approach. We report on the analysis of the 
following software artifacts: 

OAE: the Onboard Abort Executive (OAE) LEJJj software compo- 
nent manages the Crew Exploration Vehicle’s ascent abort handling 
developed at NASA. OAE has 1400 LOC, 36 input variables rang- 
ing over large domains, and fairly complicated logic encoding the 
flight rules (a path condition can have approx. 60 constraints). We 
are interested in the probability of the OAE not raising a mission 
abort command. 

MER : models a component of the flight software for JPL’s Mars 
Exploration Rovers (MER) [HD; it consists of a resource arbiter and 
two user components competing for five resources. MER has 4697 
LOC (including the Polyglot framework). The software has an er- 
ror (see 510) and is driven by input test sequences. We analyze two 
versions: MER (small) for sequence length 8 and MER (large) for 
sequence length 20; the latter cannot be analyzed fully with sym- 
bolic execution because of the huge number of execution paths. 

Sorting: an implementation of Insertion sort. We calculate the 
probability of sorting an array of size n in exactly n(n — 1 )/2 com- 
parisons, i.e., the worst case. A large number of paths need to be 
analyzed (n!), but only 1 path leads to the worst case. Despite being 
a simple algorithm, this example is very challenging for statistical 
techniques due to the low probability of hitting any failure. We 
analyze a version for n = 7. 

Windy : a standard example in the reinforcement learning com- 
munity that involves a robot moving in a grid from a start to a goal 
state. A crosswind can blow the robot off course and an added 
weight to the robot counter-balances that. We analyze two versions: 
Windy (small) has a 5 x 4 grid and solutions limited to 5 moves, and 
Windy (large) has a 9 x 4 grid and 12 moves. The latter cannot be 
analyzed exhaustively with symbolic execution because of the very 
large number of paths the robot may follow. We consider reaching 
the goal state in the specified number of moves as a success. 

OAE was analyzed on a Red Hat Linux 64bit machine with 4Gb 
of memory and a 2.8GHz Intel i7 CPU. The other software was an- 
alyzed on an Ubuntu Server 12.04.4 LTS 64bit with 16Gb of mem- 
ory and a 3.10GHz quad-core Intel Xeon CPU E31220. 

Estimation. Table 1 shows some of our results for the probability
estimation problem. $\delta$ and $\epsilon$ represent the target
confidence and accuracy, $N_I$ is the number of samples per iteration,
Iter is the number of iterations completed during analysis, Estimate
is the result computed, and Time is the time consumption in
milliseconds. For all the examples in this table we assume a uniform
usage profile for the inputs and we treat grey paths optimistically.

$\delta = 1$ denotes that IS has been used for incremental exact
analysis (thus computing the actual success probability without
uncertainty), while $N_I = 100000$ means that the analysis was purely
statistical (no IS). There are several observations to make about
these numbers:

• For OAE, $\epsilon$ and $N_I$ do not seem to play a role in the
number of iterations required, or the time consumption. This is
because after the first iteration, even with 100 samples, more than
99.8% of the domain is pruned out, and IS achieves the required
confidence quickly. Indeed, OAE has a few "success behavior" paths
accounting for most of the executions, while the abort paths share a
small probability of being taken under the uniform profile (we will
later report on a different mission profile). Thanks to Monte Carlo
sampling, the former are very likely to be sampled, and then pruned,
first. IS is dramatically faster than the purely statistical approach,
and its estimate is also closer to the true value.

• MER (small) has several execution paths occurring for roughly
the same number of inputs. IS needs more iterations to prune
them out and achieve the high accuracy and confidence goals
(unlike in the OAE case). However, due to the small number of paths
(122), after 9 iterations at least 99% of the domain is covered,
pushing the convergence of the IS estimator, which outperforms
the Bayesian estimator. Notice also that the latter does not reach
the required confidence for such high accuracy: after 100000
samples it reaches a confidence of only 0.5346 and 0.0058, for
$\epsilon$ equal to $10^{-3}$ and $10^{-5}$, respectively.

• For Sorting, only $N_I$ significantly influences the number of
iterations. This reflects the fact that, initially, the 5040 paths are
equally likely, and so we expect the impact of pruning in IS to
be small. This scenario is particularly suitable to Bayesian
estimation, which converges for $\epsilon = 10^{-3}$ after about 2500
samples. Since we limited $N_I$ to smaller values, IS was not able to
achieve convergence by its statistical component until pruning covered
a large portion of the domain. When the accuracy is raised to
$\epsilon = 10^{-5}$, the Bayesian estimator is not able to converge
within 100000 samples (final confidence ≈ 0.864), while for IS
increasing the accuracy does not require higher overhead, allowing it
to converge faster than Bayesian. For this problem, a higher $N_I$
would be a reasonable choice, especially for low accuracy.

• Windy is similar to Sorting, since there are many paths, all with
comparable probability, and only a few of them are classified
as success. However, while for Sorting the Bayesian estimator
quickly converged for accuracy $10^{-3}$ without observing any
failure, in this case the probability of success is high enough to
allow sampling both types of path. This increases the variance of the
sample, slowing down the statistical estimator. On the other hand,
for IS, thanks to pruning, as soon as the few success paths are
collected they are pruned out, reducing the variance of the samples
of subsequent iterations and speeding up convergence.

Table 1: Estimation results (* means non-convergence, ** means
exhaustive analysis)

In summary, IS is particularly effective for problems where a
subset of the execution paths accounts for a large portion of the
inputs. In this case, such paths are likely to be pruned out after
a few iterations, increasing the confidence in the partial result.
Also, IS outperforms statistical methods when high accuracy is
required. Finally, if an exact analysis is required for a problem
that would require too much memory to be analyzed with previous
approaches [BO, IS can analyze it incrementally, producing
intermediate results with quantified confidence after each iteration,
though usually taking longer.

Hypothesis testing. The results for hypothesis testing are shown
in Table 2. $\theta$ and T represent the hypothesis
($H_0: \Pr(P \models \phi) \ge \theta$) and the confidence threshold to
accept or reject $H_0$, and Result is the result computed (whether or
not the hypothesis holds), while the meanings of the other columns are
the same as before. Once again, we assume a uniform usage profile.

Table 2: Hypothesis testing results (* denotes convergence for
sufficient exact conditions, ** denotes a false positive/negative)

    θ             T      N_I      Iter   Result   Time (ms)
                  10^5   1000     1      true     110,437
    0.004083625   10^5   100      1      false    10,961
    0.004083625   10^5   1000     3      false*   210,286
    0.003073625   10^5   100000   -      true     627,864
    0.004083625   10^5   100000   -      -        6,257,120
The choices of $\theta$ are values close to the actual success
probabilities, obtained by estimation and as given in Table 1. As
expected |ti9j, hypothesis testing is usually faster than estimation.
However, when $\theta$ is very close to the actual probability of
success, Bayesian methods fail to converge within a reasonable amount
of time (results marked with -). IS responds to this situation by
requiring more iterations (more rounds of sampling/pruning). Compare,
for example, the cases with $\theta = 0.9$ and $\theta = 0.74999$ for
MER (small). IS generally performs better than pure Bayesian testing
(and for some smaller cases the sampling procedure covered, by
chance, the full domain after just a few iterations, producing an
exact result). Interestingly, in the first experiment reported for
Windy (small) with $N_I = 100$ we obtained a false negative result. In
this case the Bayesian component of IS converged to a false decision
after the 100 samples produced, by chance, 100 failures. Increasing
the number of samples $N_I$ was enough to avoid this error.


Table 3: Hypothesis testing results where "classical" symbolic
execution runs out of memory (* denotes convergence for sufficient
exact conditions)

    MER (large)
    θ       T      N_I    Iter   Result   Time (ms)
    0.2     10^3   100    1      true     55,004
    0.2     10^5   1000   1      true     287,829
    0.35    10^5   100    15     false*   913,109
    0.35    10^5   1000   1      true     287,372

    Windy (large)
    θ       T      N_I    Iter   Result   Time (ms)
    10^-1   10^5   100    1      false    30,000
    10^-1   10^5   1000   1      false    61,968
    10^-3   10^5   100    174    true     6,836,523
    10^-3   10^5   1000   7      true     804,979
    10^-5   10^5   100    5      true     146,986
    10^-5   10^5   1000   1      true     82,998


Intractable “classic” symbolic execution. Table 3 shows the
results for a second set of experiments where we ran the techniques
on the larger examples for which "classical" symbolic execution is
intractable. We show results for the most efficient technique from
the smaller cases, i.e., IS for hypothesis testing. There, IS was
able to converge to a decision within a reasonable amount of time.
Nevertheless, for MER (large) with $\theta$ close to the actual success
probability, the large number of execution paths led to a false
positive result for $\theta = 0.35$ and $N_I = 1000$; we know it is
a false positive because with $N_I = 100$ we obtained termination
through a sufficient condition check. As already discussed, a false
positive result is possible for statistical testing. IS can mitigate
this issue by leveraging its exact analysis component, as in the case
of $N_I = 100$, although, in some cases, even 100 samples could be
enough to make the Bayesian component of IS converge to the wrong
conclusion, and an even smaller value for $N_I$ might be required.

Usage profiles. We briefly mention the impact of the usage
profiles on the probability of satisfying a target property. We
analyzed OAE with a different usage profile, where one input
variable (thrust) has a Gaussian (normal) distribution. The Gaussian
distribution was approximated by discretizing the domain of thrust
into 5 segments, which led to 5 usage scenarios with different
probabilities [0].

Under this usage profile, the density of inputs following the
"normal behavior" paths is reduced, requiring more rounds of pruning
for IS estimation to converge, even for accuracy as low as $10^{-1}$.
This results in longer computation time, though still within
reasonable ranges. For example, IS with Bayesian estimation for
confidence 0.975 took approx. 50,000 ms in 5 iterations (with 100 or
1000 samples per iteration) while for confidence 0.99 it took
approx. 167,000 ms in 6 iterations.
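
One way such a discretized profile could be built is sketched below. The equal-width segmentation, the renormalization over the bounded domain, and the concrete range, mean, and standard deviation are all assumptions made for illustration; they are not the values used for OAE.

```python
from math import erf, sqrt


def normal_cdf(x, mean, std):
    """Distribution function of a normal random variable."""
    return 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))


def discretize_gaussian(low, high, mean, std, segments=5):
    """Split [low, high] into equal-width segments and assign each one the
    Gaussian probability mass it contains, renormalized to sum to 1."""
    width = (high - low) / segments
    edges = [low + i * width for i in range(segments + 1)]
    intervals = list(zip(edges, edges[1:]))
    masses = [normal_cdf(b, mean, std) - normal_cdf(a, mean, std)
              for a, b in intervals]
    total = sum(masses)
    return [(interval, mass / total)
            for interval, mass in zip(intervals, masses)]


# Hypothetical input variable ranging over [0, 100], centered at 50.
for (a, b), prob in discretize_gaussian(0.0, 100.0, mean=50.0, std=15.0):
    print(f"[{a:5.1f}, {b:5.1f}) -> {prob:.4f}")
```

Each segment then acts as a usage scenario whose probability weights the inputs falling in it, in place of the uniform profile.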

The source code for all the examples (except OAE) and more 
experimental data are available from 08D- 


7. RELATED WORK 

Our work is related to statistical model checking (SMC) [E2D, 
also formulated as a statistical hypothesis testing problem veri- 
fied through Wald’s sequential probability ratio test (SPRT) BZ9]. 
SPRT does not fix the required number of samples a priori but 
uses a sequential approach to decide after each sample whether 
to stop or continue. A different hypothesis testing criterion has
been proposed in [ED, where the size of the sample set is
automatically increased until it allows for satisfying the convergence
criteria. In DOB. SMC has been formulated as an estimation problem,
with the number of samples fixed a priori by means of the
Chernoff and Hoeffding bound LlidlJ. Other approaches for deciding
the number of samples have been discussed in EL 0]. Some
of these approaches have been implemented in well-known
probabilistic model checkers BELCH].

In our work we combined Bayesian inference techniques with 
exact analysis through the IS technique, which is shown to provide 
better performance than the pure Bayesian analysis. 

A recent approach related to ours [11 9IJ provides automated re- 
liability estimation over partial systematic explorations applied to 
models. The approach first performs sampling over the model and 
then applies invariant inference over the samples. The inferred in- 
variant characterizes a partial model which is then exhaustively ex- 
plored using (exact) probabilistic model checking, obtaining better 
results than (full model) probabilistic and statistical model check- 
ing for system models. 

The techniques we propose are different. Indeed we focus on the 
use of symbolic execution to analyze software from its source code, 
while [ I19IJ focuses on Markov chain models analyzed through prob- 
abilistic model checking. The samples in [110J are used to produce 
an approximate simplified model to be analyzed, while instead we 
use an iterative process that prunes the execution tree and guides 
the sampling towards low-probability paths. 

We proposed several techniques for the probabilistic analysis of 
programs Euanu]. The approaches in [0, Eli] can only perform ex- 
act analysis that requires all paths to be evaluated. The work in B2D 
addresses the approximate analysis of non-linear constraints; we 
can apply the techniques described here also in that domain, using 
the quantification procedure from [ 0 ] instead of model counting. 
Another approximate analysis for programs is proposed in [ED; 
that also uses sampling of symbolic paths (but no incremental or 
informed sampling as we do here) and gives bounds on the
probability of events of interest in a program. In more recent work
we study statistical techniques that specifically target programs that
have nondeterminism (for example due to concurrency) LlIEj. That
work also uses hypothesis testing (a simpler form than here) but its
main focus is on deriving optimal schedulers, with the best
technique using reinforcement learning for the most promising
scheduler moves.

Our work shares similar goals with guided testing techniques, 
which provide heuristics to guide the exploration of a program to- 
wards “interesting” paths (to increase coverage or to uncover er- 
rors), e.g., [0, ED and many other works. However such techniques 
do not provide statistical guarantees as we do here. 

8. CONCLUSIONS 

We described statistical symbolic execution, for the analysis of 
software implementations. The technique uses a randomized sam- 
pling of symbolic paths with Bayesian estimation and hypothesis 
testing. We also proposed Informed Sampling, an iterative ap- 
proach that first explores the paths with high statistical significance, 
prunes them from the state space and then keeps guiding the exe- 
cution along less likely paths. Informed sampling combines sta- 
tistical information from sampling with exact analysis for pruned 
paths leading to provably improved convergence of the statistical 
analysis. The techniques have been implemented in the context of 
Symbolic PathFinder and have been shown to be effective for the 
analysis of Java programs. In the future we plan to perform further 
evaluations and to investigate applications in statistical information 
flow analysis. We also plan an in-depth study on probability com- 
putations for programs with structured inputs. 
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